Transcriptomic Analysis Identifies A Tolerogenic Dendritic Cell Signature

Dendritic cells (DC) are central to regulating innate and adaptive immune responses. Strategies that modify DC function provide new therapeutic opportunities in autoimmune diseases and transplantation. Current pharmacological approaches can alter DC phenotype to induce tolerogenic DC (tolDC), a maturation-resistant DC subset capable of directing a regulatory immune response that are being explored in current clinical trials. The classical phenotypic characterization of tolDC is limited to cell-surface marker expression and anti-inflammatory cytokine production, although these are not specific. TolDC may be better defined using gene signatures, but there is no consensus definition regarding genotypic markers. We address this shortcoming by analyzing available transcriptomic data to yield an independent set of differentially expressed genes that characterize human tolDC. We validate this transcriptomic signature and also explore gene differences according to the method of tolDC generation. As well as establishing a novel characterization of tolDC, we interrogated its translational utility in vivo, demonstrating this geneset was enriched in the liver, a known tolerogenic organ. Our gene signature will potentially provide greater understanding regarding transcriptional regulators of tolerance and allow researchers to standardize identification of tolDC used for cellular therapy in clinical trials.


INTRODUCTION
Dendritic cells (DC) represent a population of bone marrow (BM)-derived cells responsible for the collection and presentation of captured antigen (Ag) (1). DC are found throughout the body, and their capacity for Ag presentation provides a crucial link between innate and adaptive immune responses. Multiple DC subsets have been described, broadly divided into myeloid and plasmacytoid groups (2). Similar to other immune cells, DC are also able to alter their phenotype and function based on environmental cues (3), contextual inflammatory signaling, and the presence of self/non-self Ag. Classically, mature DC drive effector T cell responses, and immature DC mediate central or peripheral tolerance primarily through immunoregulatory factors that induce regulatory or anergic T cells (4). An additional subset that are maturation-resistantso-called tolerogenic DC (tolDC)can be manufactured ex vivo but have not yet been found physiologically. TolDC have been extensively interrogated in pre-clinical models, and are exceedingly effective at limiting host immune responses that drive autoimmune disease [summarized in (5)] or allograft rejection in transplantation [summarized in (6)]. Capitalizing on their ability to modulate T and/or B cell behavior and release immunomodulatory molecules, tolDC have been used in recent phase I/II clinical trials for type 1 diabetes (7), rheumatoid arthritis (8), multiple sclerosis (9), and liver and kidney transplantation (10) as therapeutic agents that reduce exposure to non-specific immunosuppressive drugs.
Multiple protocols for the generation of tolDC exist (11). BMderived progenitors (animals) and CD14+ peripheral blood mononuclear cells (PBMC, humans) are driven towards prototypic DC using growth factor/cytokine cocktails, and then "tolerized" pharmacologically. Interleukin-10 (IL-10) and vitamin D-based regimens are most frequently used, a substantial list of pharmacological modifiers of DC function exists (12) which continues to expand (6). Avoiding ex vivo isolation and manipulation, in vivo modulation using DCspecific targeting techniques, such as nanoparticles (13,14) or antibodies (15), can directly deliver a pharmacological payload. Despite treatment heterogeneity, the DC phenotype is characterized by immunoregulatory properties (16) which then assumes generation of stable tolDC.
Identification of DC subsets is typically based on cell-surface markers. Although expression appears relatively conserved between species, tissues and disease models (2), the same standardized characteristics are not yet available for tolDC. Indeed, tolDC used in recent clinical studies did not have uniform methods for generation, phenotype or functional measurements (17). To date, there is no consensus for "goldstandard" validation of tolerogenic properties, and current methods range from analysis of cell-surface markers to allogeneic T cell stimulation (10). This has significant implications for clinical trials where differences in tolDC generation may impact clinical outcomes. There is also ongoing concern that tolDC are not stably manipulated and, like regulatory T cells, can be subverted to activated or inflammatory forms by a permissive microenvironment. Understanding gene changes that robustly reflect tolDC would be a useful tool in standardizing their generation, which may ultimately impact patient outcome.
Transcriptomic analysis allows for the identification of conserved and differentially expressed genes in tolDC regardless of the method of generation. A specific transcriptomic signature may also assist with discovery of surrogate markers that may be used clinically. The adaptation of differentially expressed genes to enrichment pathways also provides insight into the biological interpretability of gene(s) of interest. Recent literature (18) seeking to bridge this gap in the literature are limited to consolidating already reported signatures of previous studies and drawing on published conclusions to extract a transcriptome unique to the tolDC phenotype. We have addressed this shortcoming by analyzing available datasets to yield an independent set of differentially expressed genes within each study. Comparing these results across datasets yielded a common tolDC transcriptome which we then validated. We used the same pipeline to generate a mature DC transcriptome, and both novel gene signatures were applied to immune cell populations in vivo.

TolDC Data Acquisition
A search to identify publicly available gene expression data in the Gene Expression Omnibus (GEO) https://www.ncbi.nlm.nih.gov/ geo/ was performed using the terms: "tolerogenic dendritic cell", "regulatory dendritic cell" and "tolDC". The search for publications up to December 2020 revealed 136 Datasets, of which 98 were human. Datasets were initially excluded from downstream analysis if they did not have an immature DC phenotype (control) within the dataset. Only 24 were whole datasets, and 8 contained cell samples that included adequately phenotyped tolDC ( Figure 1A). These datasets were arbitrarily divided into two groups: 5 datasets were used for initial tolDC gene set discovery, and the 3 remaining were used for validation. One further validation dataset was obtained from ArrayExpress (19).

Data Analysis
The raw data of each of the five datasets precured [GSE13762 (20), GSE23371 (21), GSE56017 (22), GSE117946 (23), GSE52894 (24)] were obtained from the gene expression omnibus (https://www.ncbi.nlm.nih.gov/geo/). All five datasets were normalized using the quantile normalization method, with each dataset filtered to exclude genes with nil expression. Within each dataset, differential gene expression analysis was performed using limma (Smyth G. K. 2004) with Benjamini-Hochberg multiple testing correction (p < 0.05). In this way, a moderated test statistic was calculated for each gene within each dataset. Moderated test statistics were converted to z-scores, and subsequently p-values, as described in the directPA vignette (25). Pearson's method of combining p-values was used to derive an overall significance score for each gene across all datasets ( Figure 1B). An overall significance score of p < 0.00001 was used as the threshold to establish genes in the tolDC transcriptome.

TolDC Gene Signature Validation
Three (3) datasets acquired from GEO (GSE104438 (26), GSE98480 (27), GSE92852 (28) containing tolDC and immature DC gene expression data were used for validation. A final validation was also performed using data from ArrayExpress database (E-MTAB-6937 (19). As with our discovery and initial validation set, we analysed each dataset individually to diminish potential batch effects that would arise from merging datasets. In all datasets, the moderated test statistics for each gene were converted into z-scores (as outlined in Figure 1B) and the pattern of gene expression compared with our tolDC gene set.

Alternatively Activated Dendritic Cell Gene Signature
In a similar manner to the identification of genes critical to tolDC, we determined genes differentially expressed between the tolDC stimulated with and without lipopolysaccharide (LPS). Three datasets were used in the analysis: GSE23371 (21), GSE117946 (23), GSE52894 (24). Differential gene expression was performed using the limma pipeline optimized as above, combining the results of our analyses using Pearson's Method, and yielding a set of genes critical to defining AADC.

Analysis of Enriched Pathways
A Wilcoxon rank sum test was performed on the combined pvalue that was determined for each gene within our gene set analysis, returning a significance value for KEGG pathways that were enriched in the DC of interest. A subsequent Gene Set  Enrichment Analysis (GSEA) was performed on the ranked list of genes, executed using the clusterProfliler (29) package in R.

Signature Validation
We sought to validate the specificity of our mature and tolDC signature using in vivo datasets that contained mononuclear phagocytes (MNP), including recognized DC subsets (30,31) or peripheral blood immune cell subsets [GSE28492 (32)]. RNAseq data was normalized using the TMM method without filtering, microarray data was normalized using quantile normalization, and gene expression was compared between each cell phenotype.

Single Cell RNAseq of Kidney, Liver, and PBMC Datasets
Five individual single cell RNAseq (scRNA-seq) samples were obtained from the Panglao database (https://panglaodb.se/). The search criteria were initially limited to liver tissue only from human donors. The accession code SRA716608 was used to extract scRNAseq into R for analysis. The five samples were normalized and integrated using the harmony algorithm. The combined dataset was then analysed using the Uniform Manifold Approximation and Projection (UMAP) dimensional reduction technique. The tolDC phenotype was then plotted on the UMAP projection. To compare tolDC and mature DC gene signatures in different tissue compartments, liver (SRA716608, n = 22154 cells), peripheral blood mononuclear cells (PBMC, SRA749327, n = 15881 cells) and kidney cortex (SRA598936, n = 3573 cells) scRNA-seq samples were also acquired. Datasets belonging to individual tissue types were integrated using the harmony method, normalized and scaled. The expression of genesets was measured between DC in each tissue type.

Data Availability and Code Statement
Data utilized for this study is publicly available using the GEO accession codes listed. The code utilized to generate analysis and figures is available at: https://github.com/Harry25R/ Transcriptomic-analysis-identifies-a-tolerogenic-dendritic-cellsignature.git.

Dataset Quality Control
Five complete datasets with tolDC gene sequencing were retrieved. Each dataset had a different method of tolDC generation and 3 studies shared the same sequencing platform ( Table 1). A principal component analysis (PCA) identified phenotypic specific differences between samples in the GSE52894 dataset (24) (Figure 2A). This was consistent across all included datasets (Supplementary Figure 1A). Across the first principal component we observed large differences when DC were matured with LPS. The largest source of variation was between tolerogenic and mature DC, an expected result given the regulatory nature of tolDC compared to mature (immunogenic) DC. Confirming these results, unsupervised hierarchical clustering between samples exhibited strong correlation between samples of the same phenotype (Supplementary Figure 1B).

Establishing a tolDC Gene Signature
The results of individual differential gene expression analysis were ranked by p-value. The top 10 up-regulated and downregulated genes are listed in Tables 2A, 2B, respectively. Our results were consistent with previous reports, suggesting no homogeneity in differentially expressed genes DEG between different methods generating tolDC if only looking at the strongest changes (18). By considering more than just the top genes, we then assessed homogeneous differential gene expression across the datasets, identifying 53 genes with a combined p-value<10 -5 which we deemed to be characteristic of tolDC ( Table 3). The top 20 DEG are displayed in heatmap form ( Figure 2B).

TolDC Pathway Enrichment Analysis
Mapping DEG within the tolDC gene set to the KEGG database returned several enriched pathways ( Figure 2C). The mitogenactivated protein kinase (MAPK) pathway was significantly enriched, as were cyclic AMP, Ras-related Protein 1, Forkhead box O and tumor necrosis factor (TNF) pathways. Gene Set Enrichment Analysis (GSEA) assigned directional change to each pathway and ranked genes were then mapped against the Gene Ontology (GO) database. Encouragingly, pathways involved in antigen presentation and antigen binding were all suppressed ( Figure 2D), consistent with literature demonstrating that tolDC negatively regulate the immune response.

TolDC Gene Set Validation
Based on the initial discovery set, we identified 3 appropriate gene sets for validation (Table 4), annotating each gene by the expected enrichment direction ( Figure 3A). Our gene signature fit data from TLR-and interluekin-10-generated tolDC, although GM-CSFgenerated tolDC performed poorly in this validation step. We conducted further validation of our tolDC gene set using data   Figure 3B).

Alternatively-Activated tolDC
Propagated tolDC that are "alternatively activated" (AADC) by exposure to an inflammatory stimulus, typically LPS, also demonstrate robust regulatory properties that protect against graftversus-host disease (33,34). AADC have shown greater efficacy in controlling inflammatory immune responses in vivo (35) compared to a more modest effect from IL-10-conditioned tolDC (36). We initially interrogated three datasets that compared gene expression between AADC and tolDC, although these demonstrated different DEG ( Figure 4A, Table 5). Analysis determined 39 DEG that were enriched in AADC compared to tolDC ( Table 4 and Figure 4B), and we mapped these to GEA pathways ( Figures 4C, D).

DC Signatures in Tissue
The liver is unique amongst solid organs in its capacity to modulate local and systemic tolerance. This is contributed to by the presence of unconventional antigen presenting cells (liver sinusoidal endothelial cells, Kupffer cells) (37), altered T cell proportions (particular gd subsets) (38,39), and an increased ratio of DC to parenchymal cells (2-5 times higher in liver compared to other organs) (40). Importantly, liver-resident DC demonstrate features most consistent with a tolerogenic phenotype and function, with low endocytic capacity, decreased MHC expression, limited T cell allostimulation and high IL-10 production (41)(42)(43). Using scRNAseq samples from healthy human liver which has been clustered by cell type ( Figure 5A, Supplementary Figure 2A), we then demonstrated that upregulated genes within the tolDC signature was enriched in areas which mapped to DC/monocyte/macrophage lineage within the liver (Figures 5B-D). Downregulated genes were not overexpressed in any cell type (Supplementary Figure 2B). We also interrogated whether our tolDC signature was overexpressed in the kidney (which has significantly lower tolerogenic capacity) and/or PBMC. We were able to demonstrate that our gene signature was not enriched in either compared to liver ( Figure 5E), although an analysis of housekeeping genes (44) was not significantly different (Supplementary Figure 2C).

The Relevance of DC Gene Signatures In Vivo
DC are rare populations within the peripheral blood (45), but reside at greater frequency within tissue interstitial compartments in an immature state, and sample the environment in organs exposed to potential (neo-)antigens in lung (46,47), kidney (48), and skin (49,50). The potential for exogenous stimuli to initiate DC activation suggests that the mature DC gene signature might be enriched in tissue-specific DC subsets in vivo. A total of 64 genes were significantly differentially expressed between the mature and immature DC, and the top 52 genes were heatmapped ( Figure 6A). The enrichment analysis yielded pathways relevant to cell inflammation and infection ( Figures 6B, C). Mature DC are well-defined in the literature, and the correlation with an inflammatory gene signature demonstrates the reliability of our pipeline to resolve genes according to DC phenotype, as well as supporting the current hypothesis that DC are influenced by the surrounding environment (3).
To further demonstrate the physiological relevance of our DC gene signatures, we used a dataset identifying 6 myeloid cell subsets (31), demonstrating that our mature DC gene set correlated with the appropriate (mature) DC subset identified in vivo ( Figure 6D). This also shows our approach to identifying a cell-specific gene signature on microarray platforms could be successfully applied to RNAseq data. Interestingly, the tolerogenic and mature DC gene sets could also be applied to distinct immune cell subsets within peripheral blood (32), with the latter enriched in myeloid DC (mDC) and monocytes ( Figure 6E). We applied our mature DC signature to liver, kidney and PBMC scRNAseq samples, demonstrating significantly lower expression in liver ( Figure 6F). Kidneyresident DC and PBMC showed an enhanced mature DC signal compared to tolDC ( Figure 6G). We also interrogated a recent dataset comparing the expression profiles of mononuclear phagocytes (MNP) isolated from epidermal and dermal tissue (30). The expression of our mature, but not tolerogenic, DC signature was significantly higher in recognized DC subsets ( Figure 6H and Supplementary Figures 3A, B).

DISCUSSION
Here we derive novel, distinct genetic signatures for both tolDC and mature DC. Both gene sets align with known biological differences in phenotype and function, and can be used to identify physiological DC subsets in vivo. Most interesting was the mapping of the tolDC signature to liver DC. Our analysis also demonstrated that tolDC and immature DC are distinct subsets, despite current paradigms suggesting overlap of several features (51), and these data support the notion that tolDC indeed derive from specific transcriptional programming. We identified several genes critical to tolDC function. Several compartments of the CD1 glycoprotein complex were downregulated in the tolDC gene set. CD1 is a cell surface protein that is involved in presentation of lipid-based antigens to T-cells and natural killer cells that subsequently mediate adaptive immunity (52,53). CD1 autoreactive T-cells, particularly CD1a and CD1c, are abundant among circulating T-cells from healthy human adults and neonates (54) and are associated with a variety of diseases. The plasticity of CD1 antigen presentation highlights evolved mechanisms that regulate the self/non-self cellular lipid environment presented to Tcells. With CD1a-c expression decreased in the tolDC we can speculate defective T cell stimulation ability due to altered antigen processing and presentation (55). This finding has also been replicated in tissueresident CD103+ conventional DC which were less effective in antigen cross-presentation with accumulated lipid bodies (56).
CD14, a known monocyte cell-surface marker in blood, is expressed by tissue-based macrophages, and was significantly upregulated in tolDC. CD14 has several functions on the surface of monocytes, ranging from metabolism to pathogen-associatedmolecular pattern (PAMP) identification in the innate immune response (57). CD14 binds to extracellular LPS and acts as a secondary receptor to TLR4 in facilitating a subsequent immune response (58). However, recent data has demonstrated that DC subsets expressing CD14 impeded T-cell proliferation (59). Interestingly, CD14 and CD1a kinetics are replicated in human monocyte-derived DC whose maturation capacity are limited by co-culture with immune complexes (60).

Upregulated Genes
Alternatively-activated DC (AADC), tolerogenic DC activated by inflammatory stimuli, are effective in inducing anergic and regulatory T cell responses (34) that protects against lethal graft-versus-host-disease in pre-clinical models (33). Only 3 comparative datasets were available for analysis and did not demonstrate homogeneity between DEG from AADC and tolDC. Gene enrichment analysis demonstrated increased virus and stress responsiveness, as well as cytokine-signaling/inflammatory pathways, with concurrent downregulation of mitochondrial function. Metabolic plasticity, including enhanced catabolism, has been correlated with DC function, and our findings correlated with previous work demonstrating decreased oxidative phosphorylation capacity with LPS-stimulated tolDC (24).
TolDC are artificially generated in vitro, and therefore not wholly representative of DC found physiologically. However, natural and induced DC with tolerogenic capacity (71) are crucial for homeostatic function, particularly in tissues exposed to environmental stimuli. The liver is considered the most tolerogenic organ, and our tolDC gene signature was overrepresented in four integrated scRNAseq datasets of healthy human liver, clustering with liver-resident DC (with overlap seen in the macrophage/monocyte population). DC and macrophages are interrelated, derive from common lineages, and are often phenotypically and functionally indistinguishable (51).  Hepatic DC are distinct from other tissue-based DC (37,72), abundantly secreting immunosuppressive cytokines (41,73) that dictate immunoregulatory properties. We mapped the tolDC gene set to scRNAseq samples of healthy (and more immunogenic) kidney as a comparator, but the signature was not overexpressed, in keeping with clinical and experimental data that support organ-specific differences in allograft acceptance (74). Our pipeline generating a tolDC transcriptomic signature was applied to developing a gene set relevant to mature DC. Genes deemed significant to mature DC were strongly implicated in the inflammatory response and, using the KEGG database, mapped to TNF-a and NF-kB signaling pathways. NF-kB is a central mediator of pro-inflammatory gene induction and functions in both innate and adaptive immune cells, and central for DC maturation (75). We were able to demonstrate that our mature DC gene signature was enriched in CD1c+ mature DC rather than CLEC9A+ immature DC. These findings, while not novel, speak to the validity of our methods in characterizing DC phenotype using gene expression datasets, and demonstrate that our signature could be applied to physiological DC in vivo. This paper further highlights the need for further -omic studies to identify a consensus gene expression profile, including distinct signaling pathways, that can confirm tolDC function and stability in vivo. Despite the reported safety of tolDC in early-phase human trials (17), and known efficacy in large animal models (76), potential variability in clinical grade tolDC preparations remains a concern for translational purposes. The advent of standardized tolDC manufacturing through Focus and Accelerate Cell-based Toleranceinducing Therapies (77) aims to minimize variations in approach and is a key step towards a standardized tolDC production for pre-clinical studies and clinical trials. Understanding the genomic processes behind the functional properties of DC and identification of molecular targets of immunomodulation provide potential opportunities for intervention to silence unwanted immune responses.